Abstract

Given a sample from a discretely observed Levy process $X = (X_t)_{t\geq 0}$ with finite jump activity, we study the problem of nonparametric estimation of the Levy density $\rho$ corresponding to the process $X$. Our estimator of $\rho$ is based on a suitable inversion of the Levy-Khintchine formula and a plug-in device. The main result of the paper is an upper bound on the mean square error of the estimator of $\rho$ at a fixed point $x$. We also show that the estimator attains the minimax convergence rate over a suitable class of Levy densities.
1 Introduction



Recent years have witnessed a great revival of interest in Levy processes, which is primarily due to the fact that they have found numerous applications in both theoretical and applied fields. The main interest has been in mathematical finance, see e.g. [19] for a detailed treatment and many references. However, Levy processes have also received due attention in queueing, telecommunications, extreme value theory, quantum theory and many other fields. A thorough exposition of the fundamental properties of Levy processes can be found e.g. in 0, [2U] and [37].

It is well known that Levy processes have a close link with infinitely divisible distributions: if $X = (X_t)_{t\geq 0}$ is a Levy process, then its marginal distributions are all infinitely divisible and are determined by the distribution of $X_\Delta$, where $\Delta > 0$ is an arbitrary fixed number. Conversely, given an infinitely divisible distribution $\mu$, one can construct a Levy process $X = (X_t)_{t\geq 0}$ such that $P_{X_\Delta} = \mu$, cf. Theorem 7.10 in [37]. Hence the law of the process $X$ is uniquely characterised by the characteristic function of $X_\Delta$, where $\Delta > 0$ is some fixed number. By the Levy-Khintchine formula for infinitely divisible distributions, the characteristic function of $X_\Delta$ can be written as

$$\phi_{X_\Delta}(t) = e^{\psi_\Delta(t)},$$

where the exponent $\psi_\Delta$, called the characteristic or Levy exponent, is given by

$$\psi_\Delta(t) = \Delta i\gamma t - \frac{1}{2}\Delta\sigma^2t^2 + \Delta\int_{\mathbb{R}\setminus\{0\}}\left(e^{itx} - 1 - itx1_{[|x|\leq 1]}\right)\nu(dx), \qquad (1)$$

see Theorem 8.1 of [37]. Here $\gamma \in \mathbb{R}$, $\sigma \geq 0$ and $\nu$ is a measure concentrated on $\mathbb{R}\setminus\{0\}$ such that $\int_{\mathbb{R}\setminus\{0\}}(1\wedge x^2)\nu(dx) < \infty$. This measure is called the Levy measure, while the triple $(\gamma, \sigma^2, \nu)$ is referred to as the characteristic or Levy triplet of $X$. The parameter $\gamma$ is called the drift parameter and the constant $\sigma^2$ the diffusion parameter. The representation in (1) in terms of the Levy triplet is unique. It then follows that the Levy triplet uniquely determines the law of any Levy process. Therefore, many statistical inference problems for Levy processes can be reduced to inference on the corresponding characteristic triplets.

Until quite recently, most of the existing literature dealt with parametric inference procedures for Levy processes, see e.g. pQ, [2], [3], [3], [7], [8], [15], [27], [33], [36] and [32]. However, a nonparametric approach is also possible. It arises if one does not impose parametric assumptions on the Levy measure, or on its density in case the latter exists. A nonparametric approach can give e.g. valuable indications about the shape of the Levy density. Furthermore, parametric inference for Levy processes is complicated by the fact that for many Levy processes the marginal densities are often
intractable or not available in closed form. This makes the implementation of such a standard method as maximum likelihood difficult. We
refer to [TO], p], p2], [TO], [23], [26], [28], [33], [H] and references therein 
for a nonparametric approach to inference for Levy processes. 

In the present work we assume that the Levy measure $\nu$ has finite total mass, i.e. $\nu(\mathbb{R}) < \infty$, and that it has a density $\rho$. In essence this means that the Levy process we sample from is a sum of a linear drift, a rescaled Brownian motion and a compound Poisson process. Thus this model is related to Merton's model of an asset price, see [3T]. Nonparametric inference for a similar model was already considered in [5] and [26]. Another work that deals with nonparametric estimation of the Levy density is [17]. However, its model is different: there $\gamma = 0$ and $\sigma = 0$ are assumed, and $\nu(\mathbb{R})$ is not necessarily finite.

Since in our case $\nu(\mathbb{R}) < \infty$, the Levy-Khintchine exponent can be rewritten as

$$\psi_\Delta(t) = \Delta i\gamma t - \frac{1}{2}\Delta\sigma^2t^2 + \Delta\int_{-\infty}^{\infty}\left(e^{itx} - 1\right)\rho(x)\,dx. \qquad (2)$$

Notice that the $\gamma$'s in (1) and (2) are in general different, but we use the same symbol for economy of notation.
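The exponent (2) is easy to evaluate numerically, and $e^{\psi_1(t)}$ can be compared with the empirical characteristic function of simulated increments. The following Python sketch is only an illustration and is not part of the paper. It takes $\Delta = 1$ and assumes, for concreteness, a Gaussian jump size density, so that $\rho = \lambda\,N(\mu, s^2)$; the names `lam`, `mu` and `s` are hypothetical. Increments are simulated as drift plus Brownian motion plus compound Poisson process, with Poisson counts drawn by Knuth's multiplication method.

```python
import cmath
import math
import random


def levy_exponent(t, gamma, sigma, lam, mu, s):
    """psi_1(t) from (2) with Delta = 1 and rho = lam * N(mu, s^2) (illustrative choice)."""
    phi_f = cmath.exp(1j * mu * t - 0.5 * s**2 * t**2)  # characteristic function of N(mu, s^2)
    return 1j * gamma * t - 0.5 * sigma**2 * t**2 + lam * (phi_f - 1)


def poisson(rate, rng):
    """Knuth's multiplication method for a Poisson(rate) draw; adequate for small rates."""
    threshold, k, prod = math.exp(-rate), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_increments(n, gamma, sigma, lam, mu, s, rng):
    """Z_j = gamma + sigma * W_j + (sum of Poisson(lam) many N(mu, s^2) jumps), j = 1..n."""
    z = []
    for _ in range(n):
        jumps = sum(rng.gauss(mu, s) for _ in range(poisson(lam, rng)))
        z.append(gamma + sigma * rng.gauss(0.0, 1.0) + jumps)
    return z


def empirical_cf(t, z):
    """hat phi(t) = n^{-1} sum_j exp(i t Z_j)."""
    return sum(cmath.exp(1j * t * zj) for zj in z) / len(z)


rng = random.Random(0)
z = simulate_increments(20000, 0.5, 0.8, 1.0, 0.3, 0.5, rng)
for t in (0.5, 1.0, 2.0):
    print(t, abs(empirical_cf(t, z) - cmath.exp(levy_exponent(t, 0.5, 0.8, 1.0, 0.3, 0.5))))
```

The printed discrepancies should be of order $n^{-1/2}$.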

Suppose that the Levy process $X = (X_t)_{t\geq 0}$ is observed at the discrete time instances $\Delta, 2\Delta, \ldots, n\Delta$ with $\Delta$ kept fixed. By a rescaling argument we can, without loss of generality, take $\Delta = 1$. Based on observations $X_1, \ldots, X_n$, our goal is to estimate the density $\rho$. We will base an estimator of $\rho$ on a suitable inversion of $\phi_{X_1}$. The idea is to express the Levy measure or the Levy density in terms of $\phi_{X_1}$ and then to replace $\phi_{X_1}$ by its natural nonparametric estimator, the empirical characteristic function. This yields a plug-in type estimator for the Levy measure or the Levy density, an approach successfully applied in [5], [IT], [23], [26], [33] and [41]. The logic behind this approach is as follows. Except for some particular cases, e.g. that of the compound Poisson process, see [10] and [H], an explicit relationship expressing the Levy measure or its density directly in terms of the distribution of $X_1$ is unknown. This hampers the use of a plug-in device, which is one of the most popular and useful methods for obtaining estimators in statistics. On the other hand, the Fourier approach covers a large class of examples, as shown in the above-mentioned papers. Notice that our model also shares many features of a convolution model with partially or totally unknown error distribution, see [13], [18], [30] and [32]. For instance, in our case the Brownian components in $X_1, \ldots, X_n$ play a role similar to that of the measurement error in those papers when the latter has a normal distribution.

We proceed to the construction of an estimator of $\rho$. First, by differentiating the Levy-Khintchine formula we derive a suitable inversion formula for $\rho$. Suppose that $\int_{\mathbb{R}}x^2\rho(x)\,dx < \infty$. Since $\rho$ has a finite second moment, so does $X_1$ by Corollary 25.8 in [37]. Also $E[|X_1|]$ is finite by the Cauchy-Schwarz inequality. Hence we can differentiate $\phi_{X_1}$ with respect to $t$ to obtain



$$\phi'_{X_1}(t) = \phi_{X_1}(t)\left(i\gamma - \sigma^2t + i\int_{-\infty}^{\infty}e^{itx}x\rho(x)\,dx\right). \qquad (3)$$



Notice that differentiation of $\int_{-\infty}^{\infty}(e^{itx} - 1)\rho(x)\,dx$ under the integral sign is justified by the dominated convergence theorem, which applies because of our assumptions on $\rho$. Next rewrite (3) as



$$\frac{\phi'_{X_1}(t)}{\phi_{X_1}(t)} = i\gamma - \sigma^2t + i\int_{-\infty}^{\infty}e^{itx}x\rho(x)\,dx, \qquad (4)$$



which is possible because $\phi_{X_1}(t) \neq 0$ for all $t \in \mathbb{R}$, see e.g. Theorem 7.6.1 in [16]. Differentiating both sides of this identity with respect to $t$, we get



$$\frac{\phi''_{X_1}(t)\phi_{X_1}(t) - (\phi'_{X_1}(t))^2}{(\phi_{X_1}(t))^2} = -\sigma^2 - \int_{-\infty}^{\infty}e^{itx}x^2\rho(x)\,dx, \qquad (5)$$



where we again interchanged the order of differentiation and integration in the righthand side of (4) to obtain the righthand side of (5). Thus, by rearranging the terms, we have



$$\int_{-\infty}^{\infty}e^{itx}x^2\rho(x)\,dx = \frac{(\phi'_{X_1}(t))^2 - \phi''_{X_1}(t)\phi_{X_1}(t)}{(\phi_{X_1}(t))^2} - \sigma^2.$$



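Suppose that the righthand side is integrable, which is equivalent to the assumption that the Fourier transform of $x^2\rho(x)$ is integrable. Then by Fourier inversion

$$x^2\rho(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\left(\frac{(\phi'_{X_1}(t))^2 - \phi''_{X_1}(t)\phi_{X_1}(t)}{(\phi_{X_1}(t))^2} - \sigma^2\right)dt.$$

If $x \neq 0$, this yields

$$\rho(x) = \frac{1}{2\pi x^2}\int_{-\infty}^{\infty}e^{-itx}\left(\frac{(\phi'_{X_1}(t))^2 - \phi''_{X_1}(t)\phi_{X_1}(t)}{(\phi_{X_1}(t))^2} - \sigma^2\right)dt, \qquad (6)$$

and we obtain the desired inversion formula. This formula coincides with the one given in [12].¹ It should be compared with related inversion formulae given in [17], [33] and [41].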
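The identity behind (6), namely $((\phi'_{X_1})^2 - \phi''_{X_1}\phi_{X_1})/\phi_{X_1}^2 - \sigma^2 = \int e^{itx}x^2\rho(x)\,dx$, can be checked numerically. The Python sketch below is illustrative only. It again assumes a hypothetical Gaussian jump density $\rho = \lambda\,N(\mu, s^2)$, for which the righthand side equals $-\lambda\phi_f''(t)$ in closed form, and it approximates the derivatives of $\phi_{X_1}$ by central differences.

```python
import cmath


def psi(t, gamma, sigma, lam, mu, s):
    """Levy exponent (2) with Delta = 1 and rho = lam * N(mu, s^2) (hypothetical example)."""
    phi_f = cmath.exp(1j * mu * t - 0.5 * s**2 * t**2)
    return 1j * gamma * t - 0.5 * sigma**2 * t**2 + lam * (phi_f - 1)


def phi(t, params):
    """phi_{X_1}(t) = exp(psi(t))."""
    return cmath.exp(psi(t, *params))


def lhs_from_cf(t, params, eps=1e-3):
    """((phi')^2 - phi'' phi) / phi^2 - sigma^2, derivatives by central differences."""
    sigma = params[1]
    p0, pp, pm = phi(t, params), phi(t + eps, params), phi(t - eps, params)
    d1 = (pp - pm) / (2 * eps)
    d2 = (pp - 2 * p0 + pm) / eps**2
    return (d1**2 - d2 * p0) / p0**2 - sigma**2


def fourier_x2_rho(t, lam, mu, s):
    """int e^{itx} x^2 rho(x) dx = -lam * phi_f''(t) for f = N(mu, s^2)."""
    phi_f = cmath.exp(1j * mu * t - 0.5 * s**2 * t**2)
    return -lam * phi_f * ((1j * mu - s**2 * t) ** 2 - s**2)


params = (0.5, 0.8, 1.0, 0.3, 0.5)  # gamma, sigma, lam, mu, s
for t in (0.0, 0.7, 1.5):
    print(t, abs(lhs_from_cf(t, params) - fourier_x2_rho(t, 1.0, 0.3, 0.5)))
```

The differences are at the level of the finite-difference error, as they should be.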

¹ [12] contains a more general result, valid also for Levy densities with infinite total mass. However, the statement of the theorem in [12] mistakenly claims that the Levy density $\rho$ is bounded under the assumptions given in [12]. In reality, this can in general be ascertained only for $x^2\rho(x)$. Examples (e) and (f) considered in [15] illustrate our point.

Denote $Z_j = X_j - X_{j-1}$ and observe that $Z_1, \ldots, Z_n$ are i.i.d., which follows from the stationary independent increments property of a Levy process. Let $\hat\phi(t) = n^{-1}\sum_{j=1}^ne^{itZ_j}$. By the strong law of large numbers, for every fixed $t$ the empirical characteristic function $\hat\phi(t)$ and its derivatives with respect to $t$, $\hat\phi'(t)$ and $\hat\phi''(t)$, converge a.s. to $\phi_{X_1}(t)$, $\phi'_{X_1}(t)$ and $\phi''_{X_1}(t)$, respectively. Using a plug-in device, a possible estimator of $\rho(x)$ could be

$$\hat\rho(x) = \frac{1}{2\pi x^2}\int_{-\infty}^{\infty}e^{-itx}\left(\frac{(\hat\phi'(t))^2 - \hat\phi''(t)\hat\phi(t)}{(\hat\phi(t))^2} - \tilde\sigma^2\right)dt, \qquad (7)$$

where $\tilde\sigma^2$ is some estimator of $\sigma^2$. The problem with this "estimator" of $\rho$ is that in general the integrand in (7) is not integrable. Furthermore, small values of $\hat\phi(t)$ in its tails may render the estimator numerically unstable, since $\hat\phi(t)$ appears in the denominator in (7). Therefore, as an estimator of $\rho$ we propose the following modification of (7):

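$$\hat\rho(x) = \frac{1}{2\pi x^2}\int_{-\infty}^{\infty}e^{-itx}\left(\frac{(\hat\phi'(t))^2 - \hat\phi''(t)\hat\phi(t)}{(\hat\phi(t))^2}1_{G_t} - \hat\sigma^2\right)\phi_w(ht)\,dt. \qquad (8)$$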

Here $\phi_w$ denotes the Fourier transform of a kernel function $w$, while the number $h > 0$ denotes a bandwidth. This terminology is borrowed from kernel estimation theory, see e.g. [38]. The integral in (8) is finite under the assumption that $\phi_w$ has compact support, for instance on $[-1, 1]$, together with an appropriate assumption on $G_t$. We define the latter set by $G_t = \{|\hat\phi(t)| > \kappa_ne^{-\Sigma^2/(2h^2)}\}$. Hence $G_t$ depends on $h$, as well as on a constant $\Sigma$ and a sequence $\kappa_n$ of real numbers to be specified in the next section. At this point notice that we could also have used the diagonalised estimator

$$\frac{2}{n(n-1)}\sum_{1\leq j<k\leq n}e^{itZ_j}e^{itZ_k}$$

to estimate $(\phi_{X_1}(t))^2$ in the denominator of (6), and a similar diagonalised estimator to estimate $(\phi'_{X_1}(t))^2$. An advantage of these two estimators is that they are unbiased estimators of $(\phi_{X_1}(t))^2$ and $(\phi'_{X_1}(t))^2$, respectively, while $(\hat\phi(t))^2$ and $(\hat\phi'(t))^2$ are not. On the theoretical side, a study of such a modification of $\hat\rho$ would require the theory of U-statistics, see e.g. Chapter 12 in […]. However, since in the present paper we are mainly concerned with rates of convergence for estimation of $\rho$, we refrain from studying this possible modification of $\hat\rho$.
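The diagonalised estimator need not be computed as a double sum. Write $a_j = e^{itZ_j}$. Since $(\sum_ja_j)^2 = \sum_ja_j^2 + 2\sum_{j<k}a_ja_k$, the estimator can be evaluated in $O(n)$ operations. The Python sketch below is illustrative, and the function names are ours. It shows this shortcut next to a brute-force version and the biased plug-in $(\hat\phi(t))^2$.

```python
import cmath


def ecf_square_plugin(t, z):
    """(hat phi(t))^2: biased for (phi_X1(t))^2 because of the diagonal terms j == k."""
    s = sum(cmath.exp(1j * t * zj) for zj in z) / len(z)
    return s * s


def ecf_square_diagonalised(t, z):
    """2 / (n(n-1)) * sum_{j<k} e^{itZ_j} e^{itZ_k}, computed in O(n) via
    (sum_j a_j)^2 = sum_j a_j^2 + 2 sum_{j<k} a_j a_k."""
    n = len(z)
    a = [cmath.exp(1j * t * zj) for zj in z]
    s1 = sum(a)
    s2 = sum(aj * aj for aj in a)
    return (s1 * s1 - s2) / (n * (n - 1))


def ecf_square_bruteforce(t, z):
    """The same U-statistic by an explicit double sum (O(n^2)), for comparison."""
    n = len(z)
    a = [cmath.exp(1j * t * zj) for zj in z]
    total = sum(a[j] * a[k] for j in range(n) for k in range(j + 1, n))
    return 2 * total / (n * (n - 1))
```

For a two-point distribution and $n = 2$ one can average over all four equally likely samples. This confirms that the diagonalised version is exactly unbiased, while the plug-in version is not.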

It remains to propose an estimator of $\sigma^2$. To this end we use an estimator from [26] defined via

$$\hat\sigma^2 = \int_{-1/h}^{1/h}\max\{\min\{M_n, \log(|\hat\phi(t)|)\}, -M_n\}\,v_h(t)\,dt. \qquad (9)$$

Here $v_h$ is a kernel function depending on $h$, while $M_n$ denotes a sequence of positive numbers diverging to infinity at a suitable rate. The estimator is again based on the Levy-Khintchine formula, and we refer to [26] for the heuristics behind its introduction.
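A minimal Python sketch of (9) follows. The precise requirements on $v$ are those of Condition 2.7 below, so the kernel used here is a hypothetical choice: $v(u) = (1-u^2)(6.5625 - 32.8125u^2)$ on $[-1, 1]$. This kernel is continuous, vanishes outside $[-1,1]$, and satisfies $\int v(u)\,du = 0$ and $\int(-u^2/2)v(u)\,du = 1$. With $v_h(t) = h^3v(ht)$, a Gaussian factor $e^{-\sigma^2t^2/2}$ in $|\hat\phi|$ is therefore mapped to $\sigma^2$, and additive constants in $\log|\hat\phi|$ are annihilated. The integral is computed with Simpson's rule, and the caller supplies $t \mapsto |\hat\phi(t)|$.

```python
import math


def kernel_v(u):
    """Hypothetical kernel on [-1, 1]: continuous, int v = 0, int (-u^2/2) v(u) du = 1."""
    return (1 - u * u) * (6.5625 - 32.8125 * u * u) if abs(u) <= 1 else 0.0


def sigma2_hat(abs_cf, h, M_n, grid=2000):
    """Sketch of (9): Simpson's rule for the integral over [-1/h, 1/h] of
    max{min{M_n, log|phi_hat(t)|}, -M_n} * v_h(t), where v_h(t) = h^3 v(h t)."""
    if grid % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    a, step = -1.0 / h, 2.0 / (h * grid)
    total = 0.0
    for m in range(grid + 1):
        t = a + m * step
        mod = abs_cf(t)
        log_mod = math.log(mod) if mod > 0 else -math.inf
        clipped = max(min(M_n, log_mod), -M_n)  # truncation at level M_n
        weight = 1 if m in (0, grid) else (4 if m % 2 else 2)
        total += weight * clipped * h**3 * kernel_v(h * t)
    return total * step / 3
```

For example, $|\phi(t)| = e^{-0.32t^2 - 1}$ (that is, $\sigma^2 = 0.64$ plus a constant shift) is mapped to $0.64$ up to quadrature error.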

If $\phi_w$ is symmetric and real-valued, then by taking the complex conjugate one can see that $\hat\rho$ is real-valued, because this amounts to changing the integration variable from $t$ into $-t$ in (8). On the other hand, positivity of $\hat\rho$ is not guaranteed. This is a slight drawback often shared by estimators based on Fourier inversion and kernel smoothing. However, one can always consider $\hat\rho^+(x) = \max(\hat\rho(x), 0)$ instead of $\hat\rho(x)$. Since $\rho(x) \geq 0$, we have $|\hat\rho^+(x) - \rho(x)| \leq |\hat\rho(x) - \rho(x)|$ pointwise, so that $E[(\hat\rho^+(x) - \rho(x))^2] \leq E[(\hat\rho(x) - \rho(x))^2]$. Hence the performance of $\hat\rho^+$ is at least as good as that of $\hat\rho$ if the mean square error is used as the performance criterion.
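To make the construction concrete, here is a minimal Python sketch of (8) with the sinc kernel of Condition 2.5 below, for which $\phi_w(ht) = 1_{[-1/h,1/h]}(t)$, together with the positive part $\hat\rho^+$. It is an illustration under assumptions, not the authors' code. The integral is approximated by the trapezoidal rule, and only the real part of the integrand is accumulated, since the imaginary parts cancel by the symmetry argument above. The quantities $\hat\sigma^2$, $\Sigma$, $\kappa_n$ and $h$ are passed in by the caller. The characteristic function enters through a callable returning $(\phi, \phi', \phi'')$ at $t$, so the same routine accepts either the empirical characteristic function or, as a sanity check, the true one in the hypothetical Gaussian-jump case.

```python
import cmath
import math


def empirical_cf_derivs(z):
    """Return t -> (hat phi(t), hat phi'(t), hat phi''(t)) for the sample z."""
    n = len(z)

    def cf(t):
        e = [cmath.exp(1j * t * zj) for zj in z]
        d0 = sum(e) / n
        d1 = sum(1j * zj * ej for zj, ej in zip(z, e)) / n
        d2 = sum(-(zj ** 2) * ej for zj, ej in zip(z, e)) / n
        return d0, d1, d2

    return cf


def gaussian_jump_cf(gamma, sigma2, lam, mu, s):
    """True (phi, phi', phi'') when rho = lam * N(mu, s^2) (hypothetical test case)."""
    def cf(t):
        pf = cmath.exp(1j * mu * t - 0.5 * s * s * t * t)  # phi_f(t)
        d = 1j * mu - s * s * t                               # phi_f'(t) / phi_f(t)
        dpsi = 1j * gamma - sigma2 * t + lam * pf * d         # psi'(t)
        d2psi = -sigma2 + lam * pf * (d * d - s * s)          # psi''(t)
        ph = cmath.exp(1j * gamma * t - 0.5 * sigma2 * t * t + lam * (pf - 1))
        return ph, ph * dpsi, ph * (d2psi + dpsi * dpsi)
    return cf


def rho_hat(x, cf, h, sigma2_hat, Sigma, kappa_n, grid=2000):
    """Estimator (8) at x != 0 with the sinc kernel; trapezoidal rule on [-1/h, 1/h]."""
    threshold = kappa_n * math.exp(-Sigma ** 2 / (2 * h ** 2))  # G_t = {|phi(t)| > threshold}
    a, step = -1.0 / h, 2.0 / (h * grid)
    total = 0.0
    for m in range(grid + 1):
        t = a + m * step
        p0, p1, p2 = cf(t)
        ratio = (p1 * p1 - p2 * p0) / (p0 * p0) if abs(p0) > threshold else 0.0
        weight = 0.5 if m in (0, grid) else 1.0
        total += weight * (cmath.exp(-1j * t * x) * (ratio - sigma2_hat)).real
    return total * step / (2 * math.pi * x * x)


def rho_hat_plus(*args, **kwargs):
    """Positive part max(hat rho(x), 0), whose squared error is never larger."""
    return max(rho_hat(*args, **kwargs), 0.0)
```

Fed with the true characteristic function and $\hat\sigma^2 = \sigma^2$, the routine reproduces $\rho(x)$ up to the truncation and quadrature error. With data, one would pass `empirical_cf_derivs(z)` instead.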

The structure of the paper is as follows. In the next section we study the asymptotic behaviour of the mean square error of the proposed estimator of $\rho$ and show that it is rate-optimal over a suitable class of Levy densities. The proofs of the results from Section 2 are collected in Section 3.

2 Results 

We first formulate the conditions that will be used to establish asymptotic properties of the estimator $\hat\rho$, and we supply some comments on them. Introduce the jump size density $f(x) := \rho(x)/\nu(\mathbb{R})$.

Condition 2.1. Let the unknown density $\rho$ belong to the class of Levy densities

$$\rho(x) = \nu(\mathbb{R})f(x),\quad f \text{ is a probability density},\quad \nu(\mathbb{R}) \in (0, \Lambda],\quad \int_{-\infty}^{\infty}|t|^\beta|\phi_f(t)|\,dt \leq L,\quad \ldots$$

where $\beta$, $L$, $K$ and $\Lambda$ are strictly positive numbers.

This condition is similar to the one given in [26], and we refer to the latter for additional discussion. When $\beta$ is an integer, the integrability condition on $\phi_f$ is roughly equivalent to $f$ having a derivative of order $\beta$. The moment condition on $f$, and consequently on $\rho$, is admittedly strong. On the other hand, in mathematical finance it is customary to assume that $\rho$ has a finite exponential moment.

Condition 2.2. Let $\sigma$ be such that $\sigma \in (0, \Sigma]$, where $\Sigma$ is a strictly positive number.

Although the case $\sigma = 0$ can also be handled, in that case the truncation with $1_{G_t}$ will preclude our estimator from being rate-optimal. We concentrate on the case $\sigma > 0$, since it is more interesting from the point of view of applications. For the case when $\sigma = 0$ is known beforehand we refer to [12] and [23].

Condition 2.3. Let $\gamma$ be such that $|\gamma| \leq \Gamma$, where $\Gamma$ is a positive constant.

This condition is the same as the one in [26], cf. also [5].

Condition 2.4. Let the bandwidth $h$ depend on $n$ and be such that $h = (\eta\log n)^{-1/2}$ with $0 < \eta < \Sigma^{-2}/2$.

This condition is similar to the one given in [26]. Notice that, in order to keep our notation compact, we suppress the dependence of $h$ on $n$. On a more conceptual level, observe that in general $\sigma$ determines how fast the characteristic function $\phi_{X_1}$ decays at plus and minus infinity. Thus knowledge of $\Sigma$ gives us a lower bound on the rate of decay of $\phi_{X_1}$. The fact that the bandwidth $h$ depends on $\Sigma$ has a parallel in the condition on the smoothing parameter in [17], see Remark 4.2 there. It also arises in deconvolution problems with unknown error distribution, see [13]. As usual in kernel estimation, see e.g. p. 7 in [38], the choice of $h$ establishes a trade-off between the bias and the variance of the estimator. Too small an $h$ results in an estimator with small bias but large variance, while too large an $h$ results in an estimator with large bias but small variance. From Theorems 2.3 and 2.4 it will follow that the choice of $h$ as in Condition 2.4 is optimal, in the sense that it asymptotically minimises the order of the mean square error of the estimator $\hat\rho$ at a fixed point $x$.

Condition 2.5. Let the kernel $w$ be the sinc kernel: $w(x) = \sin x/(\pi x)$.

The sinc kernel has also been used in [26] when estimating the Levy density. It is also frequently used in deconvolution problems, see e.g. [13]. The Fourier transform of the sinc kernel is given by $\phi_w(t) = 1_{[-1,1]}(t)$.

Condition 2.6. Let the sequence $\kappa_n$ be such that $\kappa_n = \kappa(\log(3\log(3n)))^{-1}$ for a constant $\kappa > 0$.

This is a technical condition used in the proofs. The factor 3 under the logarithm sign is unimportant and is taken only to make $\kappa_n$ positive for all $n$. The intuition behind Condition 2.6 is as follows. Up to the constant $e^{-2\Lambda}$, the quantity $e^{-\Sigma^2/(2h^2)}$ gives a lower bound on the modulus of the characteristic function $\phi_{X_1}$ on the interval $[-h^{-1}, h^{-1}]$. For $n$ large enough, the indicator $1_{G_t}$ in the definition of $\hat\rho$ thus cuts off those frequencies $t$ for which $|\hat\phi(t)|$ becomes smaller than the lower bound for $|\phi_{X_1}(t)|$. Other sufficiently slowly vanishing sequences $\{\kappa_n\}$ can also be used. Of course, conditions other than Condition 2.6 are also possible, and we refer e.g. to [17] for an alternative truncation method in the definition of an estimator of a Levy density in a problem similar to ours. That particular truncation method is advantageous in the case $\sigma = 0$. However, if we know beforehand that $\sigma > 0$, it is natural to incorporate the knowledge of $\Sigma$ in the selection of the threshold level in (8), since the knowledge of $\Sigma$ is required anyway when selecting the bandwidth $h$.



Next we recall two conditions from [26] that were used to study the estimator $\hat\sigma^2$. For the convenience of the reader we also state a result on the asymptotic behaviour of its mean square error. The latter is used in the proof of Theorem 2.3 below.

Condition 2.7. Let the kernel be $v_h(t) = h^3v(ht)$, where the function $v$ is continuous and real-valued, has support on $[-1, 1]$ and is such that […]



Here $\beta$ is the same as in Condition 2.1.

It is for simplicity of the proofs that we assume that the smoothing parameter $h$ in the definition of $\hat\sigma^2$ is the same as in Condition 2.4. In practice the two need not be equal, although they have to be of the same order.

Condition 2.8. Let the truncating sequence $M = (M_n)_{n\geq 1}$ be such that $M_n = m_nh^{-2}$, where $m_n = \log\log(3n)$.
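The tuning sequences of Conditions 2.4, 2.6 and 2.8 are explicit. The Python sketch below computes them for given $n$, $\Sigma$, $\eta$ and $\kappa$; the function and argument names are ours. It also returns the threshold $\kappa_ne^{-\Sigma^2/(2h^2)}$ that defines $G_t$. Since $e^{-\Sigma^2/(2h^2)} = n^{-\eta\Sigma^2/2}$, the requirement $\eta < \Sigma^{-2}/2$ keeps this threshold above $\kappa_nn^{-1/4}$.

```python
import math


def tuning(n, Sigma, eta, kappa):
    """Bandwidth h (Condition 2.4), kappa_n (Condition 2.6), M_n (Condition 2.8),
    and the threshold kappa_n * exp(-Sigma^2 / (2 h^2)) that defines G_t."""
    if n < 2:
        raise ValueError("need n >= 2 so that log n > 0")
    if not 0 < eta < 0.5 / Sigma**2:
        raise ValueError("Condition 2.4 requires 0 < eta < Sigma^{-2}/2")
    h = (eta * math.log(n)) ** -0.5
    kappa_n = kappa / math.log(3 * math.log(3 * n))
    M_n = math.log(math.log(3 * n)) / h**2
    threshold = kappa_n * math.exp(-Sigma**2 / (2 * h**2))  # = kappa_n * n^{-eta Sigma^2 / 2}
    return h, kappa_n, M_n, threshold
```

Note how slowly $h$ decreases: it is of order $(\log n)^{-1/2}$, which is the source of the logarithmic rates below.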

Theorem 2.1. Denote by $\mathcal{T}$ the collection of all Levy triplets satisfying Conditions 2.1-2.3, and assume Conditions 2.4, 2.7 and 2.8. Let the estimator $\hat\sigma^2$ be defined by (9). Then

$$\sup_{\mathcal{T}}E\left[(\hat\sigma^2 - \sigma^2)^2\right] \lesssim (\log n)^{-\beta-3}.$$

Even though Condition 2.1 differs slightly from its counterpart in [26], this does not affect the proof of Theorem 2.1. Notice that had we not assumed $\nu(\mathbb{R}) < \infty$, there would not exist a uniformly consistent estimator of $\sigma^2$, see Remark 3.2 in [33]. In fact, even the existence of a consistent estimator of $\sigma^2$ is not clear in that general setting.

Together with the above theorem, our main tool in studying the estimator $\hat\rho$ is the following maximal inequality for the empirical characteristic function $\hat\phi(t)$ and its derivatives. Set $\hat\phi^{(0)}(t) = \hat\phi(t)$ and likewise $\phi^{(0)}_{X_1}(t) = \phi_{X_1}(t)$.

Theorem 2.2. Let $k \geq 0$ and $r \geq 1$ be integers. Then as $n \to \infty$ and $h \to 0$ we have

$$E\left[\sup_{t\in[-h^{-1},h^{-1}]}\left|\hat\phi^{(k)}(t) - \phi^{(k)}_{X_1}(t)\right|^r\right] \lesssim \frac{\||x|^{k+1}\|^r_{L_{2\vee r}(P)} + \||x|^k\|^r_{L_{2\vee r}(P)}}{h^rn^{r/2}}, \qquad (10)$$

provided $\||x|^{k+1}\|_{L_{2\vee r}(P)}$ is finite. Here the probability $P$ refers to the law of $X_1$.

The theorem generalises the corresponding result for $\hat\phi$ and $r = 2$ given in [2B]. It may also be of independent interest. For related results on the empirical characteristic function see Theorem 1 in [2T] and Theorem 4.1 in [33].
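The derivatives in (10) are themselves sample averages, $\hat\phi^{(k)}(t) = n^{-1}\sum_{j=1}^n(iZ_j)^ke^{itZ_j}$, so the quantity controlled by (10) can be explored numerically. The Python sketch below is illustrative. It computes $\hat\phi^{(k)}$ and a grid approximation of $\sup_{t\in[-h^{-1},h^{-1}]}|\hat\phi^{(k)}(t) - \phi^{(k)}_{X_1}(t)|$ for a user-supplied true derivative. Replacing the supremum by a maximum over a grid is our simplification.

```python
import cmath


def ecf_derivative(t, z, k):
    """hat phi^{(k)}(t) = n^{-1} sum_j (i Z_j)^k exp(i t Z_j)."""
    return sum((1j * zj) ** k * cmath.exp(1j * t * zj) for zj in z) / len(z)


def sup_deviation(z, k, h, true_derivative, grid=200):
    """Grid approximation of sup_{|t| <= 1/h} |hat phi^{(k)}(t) - phi_{X_1}^{(k)}(t)|."""
    ts = [-1.0 / h + 2.0 * m / (h * grid) for m in range(grid + 1)]
    return max(abs(ecf_derivative(t, z, k) - true_derivative(t)) for t in ts)
```

The closed form of $\hat\phi^{(k)}$ agrees with a finite-difference derivative of $\hat\phi$. Averaging `sup_deviation` over repeated samples would give a Monte Carlo counterpart of the lefthand side of (10) for $r = 1$.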

Equipped with the above two theorems, we are now ready to formulate the main result of the paper. It concerns the mean square error of the estimator $\hat\rho$ at a fixed point $x \neq 0$. Notice that we prefer to work with asymptotics that are uniform in Levy triplets, because the superefficiency phenomenon in nonparametric estimation makes fixed-parameter asymptotics difficult to interpret, see e.g. [9] for a good discussion. This also explains why we imposed certain smoothness assumptions on the class of Levy densities. Too large a class of densities, e.g. that of all continuous densities, usually cannot be handled when dealing with uniform asymptotics, see e.g. Theorem 1 on p. 36 in [22] for an example from probability density estimation.

Theorem 2.3. Denote by $\mathcal{T}$ the collection of all Levy triplets satisfying Conditions 2.1-2.3, and assume Conditions 2.4-2.8. Let the estimator $\hat\rho$ be defined by (8). Then we have

$$\sup_{\mathcal{T}}E\left[(\hat\rho(x) - \rho(x))^2\right] \lesssim (\log n)^{-\beta}$$

for every fixed $x \neq 0$.

Thus the convergence rate of our estimator turns out to be logarithmic, just as for the estimator proposed in [26]. This result can easily be understood on an intuitive level by comparison with a nonparametric deconvolution problem. If the distribution of the measurement error in a deconvolution model is normal, and if the class of target densities is massive enough, e.g. some Hölder class (see Definition 1.2 in [38]), then the minimax convergence rate for estimation of an unknown density is logarithmic. This holds both for the mean squared error and for the mean integrated squared error as measures of risk, see [23] and [25]. Of course, the same holds true for deconvolution models with unknown error variance, see [13] and [30]. In fact, we will prove that our estimator $\hat\rho$ attains the minimax convergence rate for estimation of the Levy density $\rho$ at a fixed point $x$ when the risk is measured by the mean square error; see the theorem below.

Theorem 2.4. Let $T$ be a Levy triplet $(\gamma, \sigma^2, \rho)$ such that $|\gamma| \leq \Gamma$, $\sigma \in (0, \Sigma]$ and $\nu(\mathbb{R}) \in (0, \Lambda]$, where $\Gamma$, $\Sigma$ and $\Lambda$ are strictly positive constants. Assume that

$$\int_{-\infty}^{\infty}|t|^\beta|\phi_f(t)|\,dt \leq L \qquad (11)$$

for constants $\beta > 0$ and $L > 0$. Let $\mathcal{T}$ be the collection of all such triplets. Then for every fixed $x$ we have

$$\inf_{\tilde\rho_n}\sup_{\mathcal{T}}E\left[(\tilde\rho_n(x) - \rho(x))^2\right] \gtrsim (\log n)^{-\beta}, \qquad (12)$$

where the infimum is taken over all estimators $\tilde\rho_n$ based on observations $X_1, \ldots, X_n$.

We conclude this section by comparing $\hat\rho$ with the estimator $\check\rho_n$ proposed in [26]. Up to some additional truncation, the latter estimator is given by

$$\check\rho_n(x) = \frac{1}{2\pi}\int_{-1/h}^{1/h}e^{-itx}\,\mathrm{Log}\left(\frac{\hat\phi(t)}{e^{i\hat\gamma t}e^{-\hat\lambda - \hat\sigma^2t^2/2}}\right)dt, \qquad (13)$$

where Log denotes the so-called distinguished logarithm, i.e. a logarithm that is a continuous and single-valued function of $t$; see Theorem 7.6.2 of [16] for the definition. Furthermore, $\hat\gamma$, $\hat\lambda$ and $\hat\sigma^2$ are estimators of $\gamma$, $\nu(\mathbb{R})$ and $\sigma^2$, respectively. Notice that in general the distinguished logarithm $\mathrm{Log}(g(t))$ of a function $g$ is not the composition of a fixed branch of the ordinary logarithm with $g$. The estimator $\check\rho_n$ seems to be given by a more complicated expression than $\hat\rho$, because it depends explicitly on estimators of $\gamma$ and $\nu(\mathbb{R})$ in addition to the estimator of $\sigma^2$. Furthermore, the distinguished logarithm in (13) can be defined only for those $\omega$'s from the sample space $\Omega$ for which $\hat\phi$, as a function of $t$, does not hit zero on $[-h^{-1}, h^{-1}]$. For those $\omega$'s for which this is not satisfied, $\check\rho_n$ has to be assigned an arbitrary value, e.g. one can take $\check\rho_n$ to be the standard normal density. It is shown in [26] that, under appropriate conditions, the probability of the event that $\hat\phi$ hits zero for $t$ in $[-h^{-1}, h^{-1}]$ vanishes as $n \to \infty$. However, an almost sure result of a similar type remains unknown. This seems to be a disadvantage of the estimator $\check\rho_n$. On the other hand, the estimator $\hat\rho$ is undefined for $x = 0$, and a study of its asymptotic properties requires stronger moment conditions on the Levy density $\rho$. In conclusion, both estimators are rate-optimal, but each of them seems to have its own advantages over the other.
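Numerically, a distinguished logarithm is obtained by phase unwrapping along a grid that starts at $t = 0$, where $\mathrm{Log}(\hat\phi(0)) = 0$. The following Python sketch is an illustration under assumptions, not the construction of [16]. It is reliable only if the grid is fine enough that the argument changes by less than $\pi$ between neighbouring points, and it fails explicitly when the function hits zero, which mirrors the problem with (13) discussed above. For $t < 0$ one would unwrap along a grid running from $0$ to $-h^{-1}$ in the same way.

```python
import cmath
import math


def distinguished_log(values):
    """Continuous logarithm along a grid 0 = t_0 < t_1 < ... of the values g(t_m).

    Starts from the principal value at t_0 and adds the principal argument of each
    ratio g(t_{m+1}) / g(t_m); correct only if that argument stays within (-pi, pi).
    """
    if any(v == 0 for v in values):
        raise ValueError("the function hits zero on the grid; Log is undefined")
    logs = [cmath.log(values[0])]
    for prev, cur in zip(values, values[1:]):
        increment = cmath.phase(cur / prev)  # principal argument in (-pi, pi]
        logs.append(complex(math.log(abs(cur)), logs[-1].imag + increment))
    return logs


ts = [0.01 * m for m in range(301)]
g = [cmath.exp(5j * t - 0.5 * t * t) for t in ts]
dl = distinguished_log(g)
```

For $g(t) = e^{5it - t^2/2}$ the unwrapped result recovers $5it - t^2/2$ on the whole grid, whereas the principal logarithm jumps by multiples of $2\pi i$ once $5t$ exceeds $\pi$.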



3 Proofs 



Proof of Theorem 2.2. The proof follows the same lines as the ones in [13], pp. 326-327, and [26], pp. 334-335. We also take the opportunity to correct some inaccuracies.
We have

$$E\left[\sup_{t\in[-h^{-1},h^{-1}]}\left|\hat\phi^{(k)}(t) - \phi^{(k)}_{X_1}(t)\right|^r\right] = \frac{1}{n^{r/2}}E\left[\sup_{t\in[-h^{-1},h^{-1}]}|\mathbb{G}_nv_{t,k}|^r\right],$$

where $\mathbb{G}_nv_{t,k}$ denotes the empirical process

$$\mathbb{G}_nv_{t,k} = \frac{1}{\sqrt{n}}\sum_{j=1}^n\left(v_{t,k}(Z_j) - E[v_{t,k}(Z_j)]\right),$$

and the function $v_{t,k}$ is defined as $v_{t,k}: x \mapsto (ix)^ke^{itx}$. Introduce the functions $v_{t,k,1}: x \mapsto x^k\sin(tx)$ and $v_{t,k,2}: x \mapsto x^k\cos(tx)$. Since $|i^k| = 1$ and $e^{itx} = \cos(tx) + i\sin(tx)$, the $c_r$-inequality gives



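$$E\left[\sup_{t\in[-h^{-1},h^{-1}]}|\mathbb{G}_nv_{t,k}|^r\right] \lesssim E\left[\sup_{t\in[-h^{-1},h^{-1}]}|\mathbb{G}_nv_{t,k,1}|^r\right] + E\left[\sup_{t\in[-h^{-1},h^{-1}]}|\mathbb{G}_nv_{t,k,2}|^r\right].$$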



Furthermore, by differentiability of $v_{t,k,j}$ with respect to $t$ and the mean value theorem we have

$$|v_{t,k,j}(x) - v_{s,k,j}(x)| \leq |x|^{k+1}|t - s| \qquad (14)$$

for $j = 1, 2$. Consequently, $v_{t,k,j}$ is Lipschitz in $t$ with Lipschitz constant $|x|^{k+1}$.
In what follows we will need some definitions and results from the theory of empirical processes. For all unexplained terminology and notation we refer e.g. to Section 19.2 of [39] or Section 2.1.1 of [40]. First of all, consider the class of functions $\mathcal{F}_{n,j}$, which consists of the functions $v_{t,k,j}$ with $t \in [-h^{-1}, h^{-1}]$, for $j = 1, 2$. By (14) and Theorem 2.7.11 of [40], the bracketing number $N_{[\,]}$ of $\mathcal{F}_{n,j}$ can be bounded by the covering number $N$ of the interval $I_n = [-h^{-1}, h^{-1}]$ as follows:

$$N_{[\,]}(2\varepsilon\||x|^{k+1}\|_{L_2(Q)}; \mathcal{F}_{n,j}; L_2(Q)) \leq N(\varepsilon; I_n; |\cdot|).$$

Here $Q$ is any probability measure. It is easily seen that

$$N(\varepsilon\||x|^{k+1}\|_{L_2(Q)}; \mathcal{F}_{n,j}; L_2(Q)) \leq N_{[\,]}(2\varepsilon\||x|^{k+1}\|_{L_2(Q)}; \mathcal{F}_{n,j}; L_2(Q)),$$

cf. p. 84 in [40], and that

$$N(\varepsilon; I_n; |\cdot|) \leq \frac{2}{\varepsilon h} + 1.$$

Hence we obtain

$$N(\varepsilon\||x|^{k+1}\|_{L_2(Q)}; \mathcal{F}_{n,j}; L_2(Q)) \leq \frac{2}{\varepsilon h} + 1. \qquad (15)$$



Taking $s = 0$, it follows from the definition of $v_{t,k,j}$ and (14) that the function $F_{h,1}(x) = |x|^{k+1}h^{-1}$ can be used as an envelope for the class $\mathcal{F}_{n,1}$, while $F_{h,2}(x) = |x|^{k+1}h^{-1} + |x|^k$ can serve as an envelope for $\mathcal{F}_{n,2}$. Next define $J(1, \mathcal{F}_{n,j})$, the entropy of the class $\mathcal{F}_{n,j}$, as

$$J(1, \mathcal{F}_{n,j}) = \sup_Q\int_0^1\left\{1 + \log N(\varepsilon\|F_{h,j}(x)\|_{L_2(Q)}; \mathcal{F}_{n,j}; L_2(Q))\right\}^{1/2}d\varepsilon,$$

where $j = 1, 2$ and the supremum is taken over all discrete probability measures $Q$ such that $\|F_{h,j}(x)\|_{L_2(Q)} > 0$. Notice that the $\mathcal{F}_{n,j}$'s are measurable classes of functions with measurable envelopes. Theorem 2.14.1 in [40] then implies that

$$E\left[\sup_{t\in[-h^{-1},h^{-1}]}|\mathbb{G}_nv_{t,k,j}|^r\right] \lesssim \|F_{h,j}(x)\|^r_{L_{2\vee r}(P)}\left(J(1, \mathcal{F}_{n,j})\right)^r.$$

Here the probability $P$ refers to the distribution of $X_1$ under the Levy triplet $(\gamma, \sigma^2, \rho)$. Observe that […]. Moreover, provided $h < 1$, we have

$$\|F_{h,j}(x)\|_{L_{2\vee r}(P)} \leq h^{-1}\left(\||x|^{k+1}\|_{L_{2\vee r}(P)} + \||x|^k\|_{L_{2\vee r}(P)}\right).$$

Here we also used the $c_2$-inequality and the elementary inequality $(a + b)^{1/r} \leq a^{1/r} + b^{1/r}$, valid for positive $a$ and $b$.

It remains to bound the entropy $J(1, \mathcal{F}_{n,j})$. Since $\|F_{h,1}(x)\|_{L_2(Q)} = h^{-1}\||x|^{k+1}\|_{L_2(Q)}$, taking $\varepsilon h^{-1}$ instead of $\varepsilon$ in (15) gives

$$N(\varepsilon\|F_{h,1}(x)\|_{L_2(Q)}; \mathcal{F}_{n,1}; L_2(Q)) \leq \frac{2}{\varepsilon} + 1. \qquad (16)$$

Furthermore, since $\|F_{h,2}(x)\|_{L_2(Q)} \geq \||x|^{k+1}h^{-1}\|_{L_2(Q)}$, the monotonicity of $N$ in the radius of the covering balls combined with (15) yields

$$N(\varepsilon\|F_{h,2}(x)\|_{L_2(Q)}; \mathcal{F}_{n,2}; L_2(Q)) \leq \frac{2}{\varepsilon} + 1. \qquad (17)$$

Inserting the bounds (16) and (17) into the definition of $J(1, \mathcal{F}_{n,j})$, we see that

$$J(1, \mathcal{F}_{n,j}) \leq \int_0^1\left\{1 + \log\left(\frac{2}{\varepsilon} + 1\right)\right\}^{1/2}d\varepsilon < \infty.$$

This yields the statement of the theorem. □
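The finiteness of the entropy bound can also be checked numerically. The Python sketch below evaluates $\int_0^1\{1 + \log(2/\varepsilon + 1)\}^{1/2}d\varepsilon$ by the midpoint rule, which avoids evaluating the integrable singularity at $\varepsilon = 0$. It compares the result with two elementary bounds, $\sqrt{1 + \log 3} \leq J \leq \sqrt{1 + 3\log 3 - 2\log 2}$. The lower bound holds because the integrand is decreasing. The upper bound follows from Jensen's inequality together with $\int_0^1\log(2/\varepsilon + 1)\,d\varepsilon = 3\log 3 - 2\log 2$.

```python
import math


def entropy_bound(num=200000):
    """Midpoint-rule value of int_0^1 sqrt(1 + log(2/eps + 1)) d eps."""
    step = 1.0 / num
    return step * sum(math.sqrt(1 + math.log(2 / ((m + 0.5) * step) + 1)) for m in range(num))


lower = math.sqrt(1 + math.log(3))                         # integrand at eps = 1
upper = math.sqrt(1 + 3 * math.log(3) - 2 * math.log(2))   # Jensen's inequality
value = entropy_bound()
```

The value lies strictly between the two bounds, so the constant produced by the entropy integral is a modest number that does not depend on $n$ or $h$.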
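Proof of Theorem 2.3. By the $c_2$-inequality we have

$$E[(\hat\rho(x) - \rho(x))^2] \lesssim |\tilde\rho(x) - \rho(x)|^2 + E[|\hat\rho(x) - \tilde\rho(x)|^2] = T_1 + T_2,$$

where

$$\tilde\rho(x) = \frac{1}{2\pi}\int_{-1/h}^{1/h}e^{-itx}\phi_\rho(t)\,dt = \frac{1}{2\pi x^2}\int_{-1/h}^{1/h}e^{-itx}\left(\frac{(\phi'_{X_1}(t))^2 - \phi''_{X_1}(t)\phi_{X_1}(t)}{(\phi_{X_1}(t))^2} - \sigma^2\right)dt.$$

Using $\phi_\rho(t) = \nu(\mathbb{R})\phi_f(t)$ and the Fourier inversion argument, we can bound $T_1$ as

$$T_1 \leq \frac{(\nu(\mathbb{R}))^2}{4\pi^2}\left(\int_{|t|>1/h}|\phi_f(t)|\,dt\right)^2 \leq \frac{(\nu(\mathbb{R}))^2}{4\pi^2}\left(\int_{|t|>1/h}|t|^\beta|t|^{-\beta}|\phi_f(t)|\,dt\right)^2 \leq \frac{(\nu(\mathbb{R}))^2}{4\pi^2}\left(\int_{-\infty}^{\infty}|t|^\beta|\phi_f(t)|\,dt\right)^2h^{2\beta} \leq \frac{\Lambda^2L^2}{4\pi^2}h^{2\beta},$$

provided that $h < 1$. Hence, by Condition 2.4, $\sup_{\mathcal{T}}T_1$ is of order $(\log n)^{-\beta}$.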
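Furthermore, by the $c_2$-inequality,

$$T_2 \lesssim \frac{1}{4\pi^2x^4}\left|\int_{-1/h}^{1/h}e^{-itx}\,dt\right|^2E\left[|\hat\sigma^2 - \sigma^2|^2\right] + \frac{1}{4\pi^2x^4}E\left[\left|\int_{-1/h}^{1/h}e^{-itx}\left(\Phi(\hat\phi(t))1_{G_t} - \Phi(\phi_{X_1}(t))\right)dt\right|^2\right] = T_3 + T_4,$$

where for a function $\zeta$ the mapping $\Phi$ is defined by

$$\Phi(\zeta(t)) = \frac{(\zeta'(t))^2 - \zeta''(t)\zeta(t)}{(\zeta(t))^2}.$$

By Theorem 2.1 in combination with Condition 2.4 we have $\sup_{\mathcal{T}}T_3 \lesssim (\log n)^{-\beta-2}$. Next notice that

$$T_4 \leq \frac{1}{\pi^2x^4h^2}E\left[\sup_{|t|\leq h^{-1}}\left|\Phi(\hat\phi(t))1_{G_t} - \Phi(\phi_{X_1}(t))\right|^2\right] = \frac{1}{\pi^2x^4h^2}T_5.$$

Hence it remains to study $T_5$. This will be done via applications of Theorem 2.2.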
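First of all, the $c_2$-inequality gives

$$T_5 \lesssim E\left[\sup_{|t|\leq h^{-1}}\left|\frac{\hat\phi''(t)}{\hat\phi(t)}1_{G_t} - \frac{\phi''_{X_1}(t)}{\phi_{X_1}(t)}\right|^2\right] + E\left[\sup_{|t|\leq h^{-1}}\left|\frac{(\hat\phi'(t))^2}{(\hat\phi(t))^2}1_{G_t} - \frac{(\phi'_{X_1}(t))^2}{(\phi_{X_1}(t))^2}\right|^2\right] = T_6 + T_7.$$

By another application of the $c_2$-inequality we obtain

$$T_6 \lesssim E\left[\sup_{|t|\leq h^{-1}}\left|\frac{\hat\phi''(t)}{\hat\phi(t)} - \frac{\phi''_{X_1}(t)}{\phi_{X_1}(t)}\right|^21_{G_t}\right] + E\left[\sup_{|t|\leq h^{-1}}\left|\frac{\phi''_{X_1}(t)}{\phi_{X_1}(t)}\right|^21_{G_t^c}\right] = T_8 + T_9.$$

The first summand in the last equality can be bounded as

$$T_8 \lesssim E\left[\sup_{|t|\leq h^{-1}}\left|\hat\phi''(t)\left(\frac{1}{\hat\phi(t)} - \frac{1}{\phi_{X_1}(t)}\right)\right|^21_{G_t}\right] + E\left[\sup_{|t|\leq h^{-1}}\left|\frac{\hat\phi''(t) - \phi''_{X_1}(t)}{\phi_{X_1}(t)}\right|^2\right] = T_{10} + T_{11}.$$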
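Further bounding gives

$$T_{10} \leq E\left[\sup_{|t|\leq h^{-1}}|\hat\phi''(t)|^2\sup_{|t|\leq h^{-1}}\frac{|\hat\phi(t) - \phi_{X_1}(t)|^2}{|\hat\phi(t)|^2|\phi_{X_1}(t)|^2}1_{G_t}\right].$$

Now apply the Cauchy-Schwarz inequality to the righthand side to obtain

$$T_{10} \leq \left(E\left[\sup_{|t|\leq h^{-1}}|\hat\phi''(t)|^4\right]\right)^{1/2}\times\left(E\left[\sup_{|t|\leq h^{-1}}\frac{|\hat\phi(t) - \phi_{X_1}(t)|^4}{|\hat\phi(t)|^4|\phi_{X_1}(t)|^4}1_{G_t}\right]\right)^{1/2} = \sqrt{T_{12}}\sqrt{T_{13}}.$$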
Observe that $|\hat\phi''(t)| \leq n^{-1}\sum_{j=1}^nZ_j^2$. Together with the $c_4$-inequality this gives

$$T_{12} \leq E\left[\left(\frac{1}{n}\sum_{j=1}^nZ_j^2\right)^4\right] \lesssim E\left[\left(\frac{1}{n}\sum_{j=1}^n(Z_j^2 - E[Z_j^2])\right)^4\right] + (E[Z_1^2])^4 \leq \frac{(3\sqrt{2})^44^{4/2}}{n^2}E\left[(Z_1^2 - E[Z_1^2])^4\right] + (E[Z_1^2])^4,$$

where the last inequality follows from the Marcinkiewicz-Zygmund inequality as given in Theorem 2 of [35]. By the Lyapunov inequality, $(E[Z_1^2])^4 \leq E[Z_1^8]$. This in combination with the $c_4$-inequality gives $E[(Z_1^2 - E[Z_1^2])^4] \lesssim E[Z_1^8]$. It remains to bound $E[Z_1^8]$ uniformly in Levy triplets. The most direct way of doing this is to notice that

$$E[Z_1^8] = E\left[(\gamma + \sigma W + Y)^8\right] \lesssim \Gamma^8 + \Sigma^8E[W^8] + E[Y^8],$$

where $W$ is a standard normal random variable, while $Y$ has a compound Poisson distribution with intensity $\nu(\mathbb{R})$ and jump size density $f$. Observe that $E[Y^8] = \phi_Y^{(8)}(0)$. Under Condition 2.1 and with the Lyapunov inequality, it is laborious, though straightforward, to show that $\phi_Y^{(8)}(0)$ is bounded by a universal constant uniformly in Levy triplets. Hence $\sup_{\mathcal{T}}E[Z_1^8]$ is also bounded, and then so is $\sup_{\mathcal{T}}\sqrt{T_{12}}$. As far as $T_{13}$ is concerned, Conditions 2.1 and 2.2 and the definition of $G_t$ give

$$T_{13} \leq e^{8\Lambda}\kappa_n^{-4}e^{4\Sigma^2/h^2}E\left[\sup_{|t|\leq h^{-1}}|\hat\phi(t) - \phi_{X_1}(t)|^4\right].$$

Inequality (10) with $k = 0$ and $r = 4$ then yields

$$T_{13} \lesssim e^{8\Lambda}\kappa_n^{-4}e^{4\Sigma^2/h^2}\frac{\||x|\|^4_{L_4(P)} + 1}{h^4n^2}.$$

The norm $\||x|\|_{L_4(P)}$ is bounded by a constant uniformly in Levy triplets; this can be proved by essentially the same argument as we used for $\sup_{\mathcal{T}}E[Z_1^8]$ above. It follows that $\sup_{\mathcal{T}}T_{13}$ is negligible in comparison to $(\log n)^{-\beta}$. This is also true for $h^{-2}\sup_{\mathcal{T}}\sqrt{T_{13}}$, and then also for $\sup_{\mathcal{T}}T_{10}$. To complete the study of $T_8$, we need to study $T_{11}$. The latter can be bounded as follows:



$$T_{11} \leq e^{4\Lambda}e^{\Sigma^2/h^2}E\left[\sup_{|t|\leq h^{-1}}|\hat\phi''(t) - \phi''_{X_1}(t)|^2\right].$$
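By the same reasoning as above, now using (10) with $k = 2$ and $r = 2$, one can show that $\sup_{\mathcal{T}}T_{11}$ is negligible compared to $(\log n)^{-\beta}$. Consequently, so is $\sup_{\mathcal{T}}T_8$. Next we deal with $T_9$. Notice that, by our conditions and the Lyapunov inequality, for $t \in [-h^{-1}, h^{-1}]$

$$\left|\frac{\phi''_{X_1}(t)}{\phi_{X_1}(t)}\right| \leq \left(|\gamma| + \sigma^2|t| + \int_{-\infty}^{\infty}|x|\rho(x)\,dx\right)^2 + \sigma^2 + \int_{-\infty}^{\infty}x^2\rho(x)\,dx \lesssim \frac{1}{h^2}.$$

Hence it holds that

$$\sup_{\mathcal{T}}\sup_{|t|\leq h^{-1}}\left|\frac{\phi''_{X_1}(t)}{\phi_{X_1}(t)}\right| \lesssim \frac{1}{h^2}. \qquad (18)$$

Consequently, we have

$$T_9 \lesssim \frac{1}{h^4}E\left[\sup_{|t|\leq h^{-1}}1_{G_t^c}\right].$$

We study the expectation on the righthand side. First of all, for $t \in [-h^{-1}, h^{-1}]$ and all $n$ large enough we have

$$G_t^c = \left\{|\hat\phi(t)| \leq \kappa_ne^{-\Sigma^2/(2h^2)}\right\} = \left\{|\phi_{X_1}(t)| - |\hat\phi(t)| \geq |\phi_{X_1}(t)| - \kappa_ne^{-\Sigma^2/(2h^2)}\right\}$$
$$\subset \left\{|\phi_{X_1}(t) - \hat\phi(t)| \geq (e^{-2\Lambda} - \kappa_n)e^{-\Sigma^2/(2h^2)}\right\} \subset \left\{\sup_{|t|\leq h^{-1}}|\phi_{X_1}(t) - \hat\phi(t)| \geq (e^{-2\Lambda} - \kappa_n)e^{-\Sigma^2/(2h^2)}\right\} = G^*.$$

Therefore $\sup_{|t|\leq h^{-1}}1_{G_t^c} \leq 1_{G^*}$, and then by Chebyshev's inequality we obtain

$$E\left[\sup_{|t|\leq h^{-1}}1_{G_t^c}\right] \leq P(G^*) \leq \frac{e^{\Sigma^2/h^2}}{(e^{-2\Lambda} - \kappa_n)^2}E\left[\sup_{|t|\leq h^{-1}}|\phi_{X_1}(t) - \hat\phi(t)|^2\right].$$

Next apply (10) with $k = 0$ and $r = 2$ to the expectation in the rightmost inequality to conclude that $\sup_{\mathcal{T}}T_9$ is negligible in comparison to $(\log n)^{-\beta}$. This shows that $\sup_{\mathcal{T}}T_6$ is also negligible in comparison to $(\log n)^{-\beta}$. To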
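complete bounding $T_5$, and eventually $T_4$, we need to bound $T_7$. By the $c_2$-inequality,

$$T_7 \lesssim E\left[\sup_{|t|\leq h^{-1}}\left|\frac{(\phi'_{X_1}(t))^2}{(\phi_{X_1}(t))^2}\right|^21_{G_t^c}\right] + E\left[\sup_{|t|\leq h^{-1}}\left|\frac{(\hat\phi'(t))^2}{(\hat\phi(t))^2} - \frac{(\phi'_{X_1}(t))^2}{(\phi_{X_1}(t))^2}\right|^21_{G_t}\right] = T_{14} + T_{15}.$$

Observe that as $h \to 0$ we have

$$\sup_{\mathcal{T}}\sup_{|t|\leq h^{-1}}\left|\frac{(\phi'_{X_1}(t))^2}{(\phi_{X_1}(t))^2}\right| \lesssim \frac{1}{h^2},$$

which can be shown by the same arguments that led to (18). Hence we also have $T_{14} \lesssim h^{-4}P(G^*)$. It then follows that $\sup_{\mathcal{T}}T_{14}$ is negligible in comparison to $(\log n)^{-\beta}$. We turn to $T_{15}$. By the $c_2$-inequality,

$$T_{15} \lesssim E\left[\sup_{|t|\leq h^{-1}}\left|(\hat\phi'(t))^2\left(\frac{1}{(\hat\phi(t))^2} - \frac{1}{(\phi_{X_1}(t))^2}\right)\right|^21_{G_t}\right] + E\left[\sup_{|t|\leq h^{-1}}\left|\frac{(\hat\phi'(t))^2 - (\phi'_{X_1}(t))^2}{(\phi_{X_1}(t))^2}\right|^2\right] = T_{16} + T_{17}.$$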
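Notice that by the Cauchy-Schwarz inequality

$$T_{16} \leq E\left[\sup_{|t|\leq h^{-1}}|\hat\phi'(t)|^4\sup_{|t|\leq h^{-1}}\left|\frac{(\phi_{X_1}(t))^2 - (\hat\phi(t))^2}{(\hat\phi(t))^2(\phi_{X_1}(t))^2}\right|^21_{G_t}\right] \leq \left(E\left[\sup_{|t|\leq h^{-1}}|\hat\phi'(t)|^8\right]\right)^{1/2}\left(E\left[\sup_{|t|\leq h^{-1}}\left|\frac{(\phi_{X_1}(t))^2 - (\hat\phi(t))^2}{(\hat\phi(t))^2(\phi_{X_1}(t))^2}\right|^41_{G_t}\right]\right)^{1/2} = \sqrt{T_{18}}\sqrt{T_{19}}.$$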
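Since $\phi'(t)=n^{-1}\sum_{j=1}^n iZ_je^{itZ_j}$ satisfies $|\phi'(t)|\le n^{-1}\sum_{j=1}^n|Z_j|$, it follows that the term $T_{18}$ is bounded by $E\big[\big(n^{-1}\sum_{j=1}^n|Z_j|\big)^8\big]$. By the $c_8$-inequality we then get

$$E\left[\left(\frac1n\sum_{j=1}^n|Z_j|\right)^8\right]\le\frac{2^7}{n^8}E\left[\left|\sum_{j=1}^n\big(|Z_j|-E[|Z_j|]\big)\right|^8\right]+2^7\big(E[|Z_1|]\big)^8.$$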
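Both ingredients of the bound on $T_{18}$ are pointwise facts: the triangle inequality gives $|\phi'(t)|\le n^{-1}\sum_j|Z_j|$, and the $c_r$-inequality $|a+b|^r\le2^{r-1}(|a|^r+|b|^r)$ with $r=8$, $a=n^{-1}\sum_j(|Z_j|-m)$ and $b=m$ yields the display above after taking expectations with $m=E[|Z_1|]$. The Python sketch below checks both on hypothetical Gaussian data; the constant `m` is an arbitrary stand-in for $E[|Z_1|]$, since the pointwise inequality holds for any real m.

```python
import cmath
import random


def phi_emp_prime(t, zs):
    """Derivative of the empirical c.f.: n^{-1} sum_j i Z_j exp(i t Z_j)."""
    return sum(1j * z * cmath.exp(1j * t * z) for z in zs) / len(zs)


def c_r_rhs(a, b, r):
    """Right-hand side of the c_r-inequality |a + b|^r <= 2^(r-1) (|a|^r + |b|^r)."""
    return 2 ** (r - 1) * (abs(a) ** r + abs(b) ** r)


rng = random.Random(1)
zs = [rng.gauss(0.3, 1.5) for _ in range(100)]  # hypothetical increments Z_j
n = len(zs)
mean_abs = sum(abs(z) for z in zs) / n  # n^{-1} sum_j |Z_j|

# Triangle inequality: |phi'(t)| <= n^{-1} sum_j |Z_j| for every t.
worst = max(abs(phi_emp_prime(k / 10, zs)) for k in range(-100, 101))

# Pointwise c_8 step: mean_abs = a + m with a = n^{-1} sum_j (|Z_j| - m).
m = 1.2  # hypothetical stand-in for E|Z_1|; any real constant works here
a = sum(abs(z) - m for z in zs) / n
lhs, rhs = mean_abs**8, c_r_rhs(a, m, 8)
print(f"max |phi'| = {worst:.4f} <= {mean_abs:.4f};  {lhs:.4f} <= {rhs:.4f}")
```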



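Hence $\sup_{\mathcal{T}}T_{18}$ is bounded by a constant, which is proved by the same argument as we used for a similar term earlier in the proof. Finally, we consider $T_{19}$. We have

$$T_{19}\le\frac{2^4e^{16\Lambda}e^{8\Sigma^2/h^2}}{\kappa_n^8}\,E\left[\sup_{t\in[-h^{-1},h^{-1}]}|\phi(t)-\phi_{X_\Delta}(t)|^4\right],$$

because

$$\left|(\phi_{X_\Delta}(t))^2-(\phi(t))^2\right|\le2\left|\phi_{X_\Delta}(t)-\phi(t)\right|,$$

$|\phi_{X_\Delta}(t)|$ is bounded from below by $e^{-2\Lambda-\Sigma^2/(2h^2)}$ for $t\in[-h^{-1},h^{-1}]$, and $|\phi(t)|\ge\kappa_ne^{-\Sigma^2/(2h^2)}$ on $G_t$. Using (10), we conclude that $\sup_{\mathcal{T}}T_{19}$ is negligible in comparison to $(\log n)^{-\beta}$.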
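Hence so is $\sup_{\mathcal{T}}T_{16}$. It remains to study $T_{17}$. Since $|\phi_{X_\Delta}(t)|^{-4}\le e^{8\Lambda+2\Sigma^2/h^2}$ for $t\in[-h^{-1},h^{-1}]$, we have

$$T_{17}\le2e^{8\Lambda+2\Sigma^2/h^2}\,E\left[\sup_{t\in[-h^{-1},h^{-1}]}\left|(\phi'(t))^2-(\phi_{X_\Delta}'(t))^2\right|^2\right],$$

and it follows from (10) and Condition 2.4 that $\sup_{\mathcal{T}}T_{17}$ is negligible in comparison to $(\log n)^{-\beta}$. Consequently, so are $\sup_{\mathcal{T}}T_{15}$ and $\sup_{\mathcal{T}}T_7$. Combining all the above results completes the proof of the theorem. □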
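The bound on $T_{19}$ rests on a pointwise inequality: if $|a|,|b|\le1$, $|b|\ge e^{-2\Lambda-\Sigma^2/(2h^2)}$ (the role of $\phi_{X_\Delta}(t)$) and $|a|\ge\kappa_ne^{-\Sigma^2/(2h^2)}$ (the role of $\phi(t)$ on $G_t$), then $|(b^2-a^2)/(a^2b^2)|\le2\kappa_n^{-2}e^{4\Lambda}e^{2\Sigma^2/h^2}|a-b|$; raising it to the fourth power and taking expectations gives the displayed bound. The Python sketch below checks this on random points of the unit disc; the values κ = 0.1, Λ = Σ² = 0.5 and h = 1 are hypothetical.

```python
import cmath
import math
import random


def point_in_unit_disc(rng):
    """Uniform point of the closed unit disc; stands in for a c.f. value."""
    return cmath.rect(math.sqrt(rng.random()), 2 * math.pi * rng.random())


def check_t19_step(kappa, lam_bound, sigma2_bound, h, trials=20000, seed=2):
    """Verify the pointwise inequalities behind the bound on T_19.

    a plays phi(t) restricted to G_t, b plays phi_{X_Delta}(t).
    Returns (number of admissible pairs, whether all checks passed)."""
    rng = random.Random(seed)
    floor = math.exp(-sigma2_bound / (2 * h**2))
    const = 2 * kappa**-2 * math.exp(4 * lam_bound + 2 * sigma2_bound / h**2)
    admissible, ok = 0, True
    for _ in range(trials):
        a, b = point_in_unit_disc(rng), point_in_unit_disc(rng)
        if abs(a) < kappa * floor or abs(b) < math.exp(-2 * lam_bound) * floor:
            continue  # a outside G_t, or b below the lower bound on |phi_{X_Delta}|
        admissible += 1
        diff = abs(a - b)
        # |b^2 - a^2| = |b - a| |b + a| <= 2 |b - a| since |a|, |b| <= 1.
        ok = ok and abs(b**2 - a**2) <= 2 * diff + 1e-12
        ok = ok and abs((b**2 - a**2) / (a**2 * b**2)) <= const * diff * (1 + 1e-9) + 1e-12
    return admissible, ok


admissible, ok = check_t19_step(kappa=0.1, lam_bound=0.5, sigma2_bound=0.5, h=1.0)
print(admissible, ok)
```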

Proof of Theorem 2.4. The statement of the theorem concerns estimators based on the observations $X_\Delta,\ldots,X_{n\Delta}$, but the relationship $Z_j=X_{j\Delta}-X_{(j-1)\Delta}$ and the stationary independent increments property of a Lévy process allow us to work with $Z_1,\ldots,Z_n$ instead. We adapt the proof of Theorem 4.1 in [ ] to the present case. The general idea of the proof is as follows: we consider two Lévy triplets $\mathcal{T}_1=(0,\sigma^2,\rho_1)$ and $\mathcal{T}_2=(0,\sigma^2,\rho_2)$, depending on $n$, such that the Lévy densities $\rho_1$ and $\rho_2$ are separated as much as possible at a point $x$, while at the same time the corresponding product densities $q_1^{\otimes n}$ and $q_2^{\otimes n}$ of the observations $Z_1,\ldots,Z_n$ are close in the $\chi^2$-divergence and hence cannot be distinguished well using $Z_1,\ldots,Z_n$. Up to a constant, the squared distance between $\rho_1(x)$ and $\rho_2(x)$ then gives the desired lower bound (12) for estimation of a Lévy density $\rho$ at a fixed point $x$. This is a standard technique; we refer to Chapter 2 of [35] for a good exposition of methods for deriving lower bounds in nonparametric curve estimation.

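Consider two Lévy triplets $\mathcal{T}_1=(0,\sigma^2,\rho_1)$ and $\mathcal{T}_2=(0,\sigma^2,\rho_2)$, where $\rho_j(u)=\nu(\mathbb{R})f_j(u)$ for $j=1,2$, and $\nu(\mathbb{R})$ and $\sigma^2$ are constants with $0<\nu(\mathbb{R})<\Lambda$ and $0<\sigma^2<\Sigma^2$. Let

$$f_1(u)=\tfrac12\big(r_1(u)+r_2(u)\big),$$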
where the two densities $r_1$ and $r_2$ are defined through their characteristic functions as follows:

$$r_1(u)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itu}\frac{1}{(1+t^2/\beta_2)^{(\beta_1+1)/2}}\,dt,\qquad r_2(u)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itu}e^{-\alpha_1|t|^{\alpha_2}}\,dt.$$

With a proper selection of $\beta_1$, $\beta_2$, $\alpha_1$ and $\alpha_2$, one can achieve that $f_1$ satisfies (11) with the constant $L/2$ instead of $L$. We also assume that $1<\alpha_2<2$.
Next, define $f_2$ by

$$f_2(u)=f_1(u)+\delta_n^{\beta}H\big((u-x)/\delta_n\big),$$

where $\delta_n\to0$ as $n\to\infty$, and the function $H$ satisfies the following conditions:

1. $H(0)>0$;

2. $\int_{-\infty}^{\infty}|t|^{\beta}|\phi_H(t)|\,dt\le L/2$;

3. $\int_{-\infty}^{\infty}H(x)\,dx=0$;

4. $\int_{-\infty}^{0}H(x)\,dx\neq0$;

5. $\phi_H(t)=0$ for $|t|\notin[1,2]$;

6. $\phi_H(t)$ is twice continuously differentiable.

Since $f_1(u)$ decays as $|u|^{-1-\alpha_2}$ at infinity (see formula (14.37) in [37]), with a proper selection of $H$, e.g. by reasoning similar to that on p. 1268 in [23], the function $f_2$ will be nonnegative, at least for all small enough $\delta_n$. Consequently, $f_2$ will be a probability density that satisfies (11). Now notice that

$$|\rho_2(x)-\rho_1(x)|=\nu(\mathbb{R})H(0)\,\delta_n^{\beta}\asymp\delta_n^{\beta}. \qquad (19)$$

The statement of the theorem will follow from (19) and Lemma 8 of [13] if we prove that, for $\delta_n\asymp(\log n)^{-1/2}$,

$$n\chi^2(q_2,q_1)=n\int_{-\infty}^{\infty}\frac{(q_2(u)-q_1(u))^2}{q_1(u)}\,du\le c, \qquad (20)$$

where a positive constant $c<1$ is independent of $n$. Here $\chi^2(\cdot,\cdot)$ denotes the $\chi^2$-divergence; see p. 86 in [38] for the definition.
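To make the $\chi^2$-divergence in (20) concrete, the Python sketch below evaluates $\int(q_2-q_1)^2/q_1$ by the trapezoidal rule for two hypothetical unit-variance normal densities, for which the exact value is $e^{\mu^2}-1$. It also uses the standard identity $\chi^2(q_2^{\otimes n},q_1^{\otimes n})=(1+\chi^2(q_2,q_1))^n-1\le e^{n\chi^2(q_2,q_1)}-1$, which explains why a bound on $n\chi^2(q_2,q_1)$ controls the divergence between the product densities.

```python
import math


def normal_pdf(u, mu=0.0):
    """N(mu, 1) density."""
    return math.exp(-((u - mu) ** 2) / 2) / math.sqrt(2 * math.pi)


def chi2_divergence(q2, q1, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoidal approximation of int (q2(u) - q1(u))^2 / q1(u) du."""
    du = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * du
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (q2(u) - q1(u)) ** 2 / q1(u)
    return total * du


mu = 0.3  # hypothetical location shift
chi2 = chi2_divergence(lambda u: normal_pdf(u, mu), normal_pdf)
closed_form = math.exp(mu**2) - 1  # exact value for these two normals
n = 10
chi2_product = (1 + chi2) ** n - 1  # chi^2 between the n-fold product densities
print(chi2, closed_form, chi2_product)
```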

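Denote by $p_1$ the density of a Poisson sum $Y=\sum_{j=1}^{N(\nu(\mathbb{R}))}W_j$ conditional on the event that its number of summands $N(\nu(\mathbb{R}))$ is positive. Here the $W_j$ are i.i.d. with $W_1\sim f_1$, and $N(\nu(\mathbb{R}))$ is a Poisson random variable with parameter $\nu(\mathbb{R})$, independent of the $W_j$. Now rewrite the characteristic function of $Y$ as

$$\phi_Y(t)=e^{-\nu(\mathbb{R})}+\left(1-e^{-\nu(\mathbb{R})}\right)\frac{1}{e^{\nu(\mathbb{R})}-1}\left(e^{\nu(\mathbb{R})\phi_{f_1}(t)}-1\right), \qquad (21)$$

to see that

$$\phi_{p_1}(t)=\frac{e^{\nu(\mathbb{R})\phi_{f_1}(t)}-1}{e^{\nu(\mathbb{R})}-1}.$$

Furthermore,

$$p_1(u)=\sum_{k=1}^{\infty}f_1^{*k}(u)\,P\big(N(\nu(\mathbb{R}))=k\,\big|\,N(\nu(\mathbb{R}))>0\big). \qquad (22)$$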
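The identities (21) and (22) can be checked numerically at a fixed $t$. The Python sketch below compares the compound Poisson formula $\phi_Y(t)=\exp(\nu(\mathbb{R})(\phi_{f_1}(t)-1))$ with the form (21), and compares the closed form of $\phi_{p_1}$ with the Fourier transform of the series (22), namely $\sum_{k\ge1}\phi_{f_1}(t)^kP(N=k\mid N>0)$. For $\phi_{f_1}$ it reads the characteristic function of $r_1$ as $(1+t^2/\beta_2)^{-(\beta_1+1)/2}$ and uses the hypothetical values $\beta_1=\beta_2=\alpha_1=1$, $\alpha_2=1.5$.

```python
import cmath
import math


def phi_f1(t, beta1=1.0, beta2=1.0, alpha1=1.0, alpha2=1.5):
    """c.f. of f_1 = (r_1 + r_2)/2 with hypothetical parameter values."""
    phi_r1 = (1 + t * t / beta2) ** (-(beta1 + 1) / 2)
    phi_r2 = math.exp(-alpha1 * abs(t) ** alpha2)
    return 0.5 * (phi_r1 + phi_r2)


def identity_errors(t, nu, terms=60):
    """Absolute errors in (21) and in the series form of (22) at a single t."""
    phi = phi_f1(t)
    phi_y = cmath.exp(nu * (phi - 1))  # c.f. of the Poisson sum Y
    form_21 = math.exp(-nu) + (1 - math.exp(-nu)) / (math.exp(nu) - 1) * (
        cmath.exp(nu * phi) - 1
    )
    closed = (cmath.exp(nu * phi) - 1) / (math.exp(nu) - 1)  # phi_{p_1}(t)
    # Weights P(N = k | N > 0) for k = 1, ..., terms - 1 (tail is negligible).
    series = sum(
        math.exp(-nu) * nu**k / math.factorial(k) / (1 - math.exp(-nu)) * phi**k
        for k in range(1, terms)
    )
    return abs(phi_y - form_21), abs(closed - series)


print(identity_errors(2.0, 1.0))
```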

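By convolving the law of $Y$ with a normal density $\phi_{\sigma^2}$ with mean zero and variance $\sigma^2$ and using (21), we obtain that

$$q_1(u)\ge\left(1-e^{-\nu(\mathbb{R})}\right)(p_1*\phi_{\sigma^2})(u).$$

Since by Lemma 2 of [13] there exists a large enough constant $A$ such that the right-hand side of the above display is not less than $(1-e^{-\nu(\mathbb{R})})p_1(|u|+A)$, we have

$$n\chi^2(q_2,q_1)\le n\int_{-\infty}^{\infty}\frac{(q_2(u)-q_1(u))^2}{\left(1-e^{-\nu(\mathbb{R})}\right)p_1(|u|+A)}\,du\lesssim n\int_{-\infty}^{\infty}\frac{(q_2(u)-q_1(u))^2}{f_1(|u|+A)}\,du.$$

The last inequality is true because by (22) it holds that $p_1(|u|+A)\ge P\big(N(\nu(\mathbb{R}))=1\,\big|\,N(\nu(\mathbb{R}))>0\big)f_1(|u|+A)$. Splitting the integration region in the rightmost term of the last display into two parts, we get that

$$n\chi^2(q_2,q_1)\lesssim n\int_{|u|\le A}(q_2(u)-q_1(u))^2\,du+n\int_{|u|>A}u^4(q_2(u)-q_1(u))^2\,du=T_1+T_2.$$

Here we used the facts that $f_1(u)$ decays as $|u|^{-1-\alpha_2}$ at infinity and that $1<\alpha_2<2$. Now notice that

$$\int_{-\infty}^{\infty}e^{itu}\delta_n^{\beta}H\big((u-x)/\delta_n\big)\,du=\delta_n^{\beta+1}e^{itx}\phi_H(\delta_nt).$$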
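The scaling identity above follows from the change of variables $u=x+\delta_ns$, with the convention $\phi_H(t)=\int e^{itx}H(x)\,dx$. The Python sketch below checks it by the trapezoidal rule for a hypothetical Gaussian bump $H(s)=e^{-s^2/2}$, whose transform is known in closed form. This $H$ does not satisfy conditions 3 to 5 above; it only serves to test the identity, which holds for any integrable $H$.

```python
import cmath
import math


def fourier_transform(g, t, lo, hi, steps=20000):
    """Trapezoidal approximation of int_lo^hi e^{itu} g(u) du."""
    du = (hi - lo) / steps
    total = 0j
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        u = lo + k * du
        total += weight * cmath.exp(1j * t * u) * g(u)
    return total * du


def H(s):
    """Hypothetical bump H(s) = exp(-s^2/2), used only to test the identity."""
    return math.exp(-s * s / 2)


def phi_H(t):
    """int e^{its} H(s) ds = sqrt(2 pi) exp(-t^2/2) for the Gaussian bump."""
    return math.sqrt(2 * math.pi) * math.exp(-t * t / 2)


def scaling_identity_gap(t, x=0.7, beta=1.5, delta=0.2):
    """|int e^{itu} delta^beta H((u-x)/delta) du - delta^{beta+1} e^{itx} phi_H(delta t)|."""
    lhs = fourier_transform(
        lambda u: delta**beta * H((u - x) / delta), t, x - 12 * delta, x + 12 * delta
    )
    rhs = delta ** (beta + 1) * cmath.exp(1j * t * x) * phi_H(delta * t)
    return abs(lhs - rhs)


print(scaling_identity_gap(3.0))
```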

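Parseval's identity then gives, with $p_2$ defined in the same way as $p_1$ but with $f_2$ in place of $f_1$,

$$\begin{aligned}
T_1&\le\frac{n}{2\pi}\int_{-\infty}^{\infty}|\phi_{q_2}(t)-\phi_{q_1}(t)|^2\,dt=\frac{n\left(1-e^{-\nu(\mathbb{R})}\right)^2}{2\pi}\int_{-\infty}^{\infty}|\phi_{p_2}(t)-\phi_{p_1}(t)|^2e^{-\sigma^2t^2}\,dt\\
&=\frac{n\left(1-e^{-\nu(\mathbb{R})}\right)^2}{2\pi\left(e^{\nu(\mathbb{R})}-1\right)^2}\int_{-\infty}^{\infty}\left|e^{\nu(\mathbb{R})\phi_{f_2}(t)}-e^{\nu(\mathbb{R})\phi_{f_1}(t)}\right|^2e^{-\sigma^2t^2}\,dt\\
&\lesssim n\int_{-\infty}^{\infty}|\phi_{f_2}(t)-\phi_{f_1}(t)|^2e^{-\sigma^2t^2}\,dt,
\end{aligned}$$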
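where the last inequality is a consequence of the mean-value theorem applied to the function $e^x$ and the fact that $|\nu(\mathbb{R})\phi_{f_j}(t)|\le\Lambda<\infty$. By definition of $f_1$ and $f_2$, and since $\phi_H(s)=0$ for $|s|<1$, it follows that

$$n\int_{-\infty}^{\infty}|\phi_{f_2}(t)-\phi_{f_1}(t)|^2e^{-\sigma^2t^2}\,dt=n\delta_n^{2\beta+2}\int_{-\infty}^{\infty}|\phi_H(\delta_nt)|^2e^{-\sigma^2t^2}\,dt=n\delta_n^{2\beta+1}\int_{-\infty}^{\infty}|\phi_H(s)|^2e^{-\sigma^2s^2/\delta_n^2}\,ds=O\left(n\delta_n^{2\beta+1}e^{-\sigma^2/\delta_n^2}\right).$$

Hence the choice $\delta_n\asymp(\log n)^{-1/2}$ with an appropriate constant implies that $T_1\to0$ as $n\to\infty$.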
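To see what "an appropriate constant" can mean for $T_1$: with $\delta_n=c(\log n)^{-1/2}$ one gets $n\delta_n^{2\beta+1}e^{-\sigma^2/\delta_n^2}=c^{2\beta+1}n^{1-\sigma^2/c^2}(\log n)^{-\beta-1/2}$, which tends to zero whenever $c^2<\sigma^2$ and diverges when $c^2>\sigma^2$. This is our reading of the condition, not a statement from the paper. The Python sketch below evaluates the logarithm of this quantity (to avoid underflow) for hypothetical values β = 1 and σ² = 1.

```python
import math


def log_t1_bound(n, c, beta, sigma2):
    """log of n * delta^(2 beta + 1) * exp(-sigma2 / delta^2), delta = c (log n)^(-1/2)."""
    log_n = math.log(n)
    delta = c / math.sqrt(log_n)
    return log_n + (2 * beta + 1) * math.log(delta) - sigma2 / delta**2


ns = [10**3, 10**6, 10**9, 10**12]
small_c = [log_t1_bound(n, c=0.5, beta=1.0, sigma2=1.0) for n in ns]  # c^2 < sigma2
large_c = [log_t1_bound(n, c=2.0, beta=1.0, sigma2=1.0) for n in ns]  # c^2 > sigma2
print(small_c)
print(large_c)
```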

To complete the proof, we need to show that $T_2\to0$ under a suitable condition on $\delta_n$. To this end, first notice that even though $\phi_{f_1}$ and $\phi_{f_2}$ are not twice differentiable at zero, the difference $\phi_{q_2}(t)-\phi_{q_1}(t)$ still is, because $\phi_H(t)$ vanishes for $|t|$ outside the interval $[1,2]$, in particular in a neighbourhood of zero. Then by Parseval's identity we obtain that

$$T_2\le\frac{n}{2\pi}\int_{-\infty}^{\infty}\left|\big(\phi_{q_2}(t)-\phi_{q_1}(t)\big)''\right|^2\,dt.$$

By the same arguments as we used for $T_1$, one can show that $T_2\to0$ as $n\to\infty$, provided $\delta_n\asymp(\log n)^{-1/2}$ with an appropriate constant. This entails the statement of the theorem. □